(54) Title: DATA BACKUP AND RECOVERY SYSTEMS

(57) Abstract 

In a network environment (200), multiple clients (210) and multiple servers (230) are connected via a
local area network (LAN) (220) to a tape backup apparatus (240). Each client (210) and each server is
provided with backup agent software (215), which schedules backup operations on the basis of time
since the last backup, the amount of information generated since the last backup, or the like. An agent
(215a) also sends a request to the tape backup apparatus (240), prior to an actual backup, including
information representative of the files that it intends to back up. The tape backup apparatus (240) is
provided with a mechanism to receive [...] or the like. The tape backup apparatus (240) is further
provided with a mechanism to [...] are already stored by the backup server [...] indicates to the
client agents, prior to files being backed up, that [...] Thus the clients do not need to send the
redundant files to be backed up. The tape backup apparatus (240) comprises a two-tier data storage
system in which all backed up data is stored on on-line media such as a hard [...] with mechanisms
whereby a client [...]

DATA BACKUP AND RECOVERY SYSTEMS 



Technical Field

The present invention relates to computer data backup and recovery and particularly, but not
exclusively, to apparatus, methods and systems for enacting data backup and recovery.

Background Art

Typically, computer networks in use today have the basic topology shown in the diagram in
Figure 1. In Figure 1, several client workstations are connected via a network to several servers.
Users perform work on the client workstations that can communicate via a network link with
the servers. Clients typically store unique user data whereas servers typically provide a common
central point for some function such as shared hard disk storage, data backup storage, software
applications or printer control.

There are a number of current schemes used for backing up data on the network. A first
scheme is for each client to individually store data to a device such as a tape drive. Tape storage has
become the preferred method of backing up information on computers due to its relatively low cost,
high capacity, durability and portability. This first scheme requires each person, or each administrator,
responsible for a client to be responsible for backing up the data. Also, each client requires its own
backup device. A second scheme is for each client to store important data on a remote file server,
where the server data is backed up at regular intervals, for example on a daily basis. Thus, if a client
fails, potentially only less important information is lost. A third scheme, which is available, for
example, to systems which operate under Windows NT4 and are part of an NT Domain, is that all
'specified' information on a client is backed up, typically overnight, to a tape drive connected to an
NT server.

Known backup schemes offer relatively good protection for client data, but with a very high
administration overhead. For example, if data needs to be recovered by a client, for example as a
result of one or more files being 'lost' or destroyed, then the client's owner typically needs to contact an
administrator of the backup system and request a restore procedure. Typically, a restore procedure
involves the backup system administrator tracking down and mounting a respective tape, on which the
last backup of the lost file(s) was made, and initiating the restore procedure. While such a procedure
is typically very reliable, it can be perceived as onerous on client users and backup system
administrators alike.

Disclosure of the Invention 
In accordance with a first aspect, the present invention provides tape storage apparatus,
comprising:

interface means for connecting the apparatus to one or more clients;

controller means for controlling the apparatus and for processing messages received from the
one or more clients;

primary storage means; and

tape storage means,

wherein the controller is programmed:

to process backup and restore messages [...] from the primary storage means; and

to backup to the tape storage means, in accordance with [...] criteria, [...] the data stored in the
primary storage means and to [...] to primary [...] means, in accordance with a respective restore
message received from a [...] tape storage means.

Preferably, the controller means is programmed to maintain stored on the primary storage
means at least the most recent version of all data received from the clients. In this way, a restore
operation can be enacted using data stored in primary storage, [...] particular backup tape.

Preferably, the controller means is programmed to backup data stored in the primary storage
means to the tape storage means independently of any messages from the clients.

In preferred embodiments, the primary storage means comprises a random access storage
means such as a hard disk drive. Alternatively, the primary storage means comprises non-volatile
random access memory (NV-RAM). For the latter case, however, the applicants believe that currently
NV-RAM would be prohibitively expensive compared to a hard disk drive.

In preferred embodiments, the tape storage apparatus comprises a housing configured
specifically to house the controller, the interface means, the primary storage means and the tape
storage means. Thus, the tape storage apparatus provides a dedicated and integrated solution to data
storage and recovery. In other, less preferred embodiments, the components of the apparatus may be
distributed, for example, with some of the components residing on or in other apparatus, such as a
computer.

In accordance with a second aspect, the present invention provides a method of backing up to
a data backup and restore apparatus attached to a network data stored in one or more clients also
attached to the network, the method comprising the data backup and restore apparatus storing in
primary data storage a most recent version of all data received from the clients and, from time to time,
in accordance with predetermined criteria, storing in secondary data storage at least some of the data
stored in the primary data storage.
In accordance with a third aspect, the present invention provides a data storage system
comprising:

a network;

tape storage apparatus as claimed in any one of claims 1 to 14; and

at least one client connected to the apparatus, the (or the at least one) client comprising client
storage means and client processing means, the client processing means being programmed in
accordance with pre-determined criteria to determine when data stored in the client storage means
should be backed up to the tape backup apparatus.

Other aspects and embodiments of the invention will become apparent from the following
description and claims.

Brief Description of the Drawings 
An embodiment of the present invention will now be described in more detail, by way of
example only, with reference to the following drawings, of which:

Figure 1 is a diagram of a conventional computer network;

Figure 2 is a diagram of a computer network modified to operate in accordance with the
present exemplary embodiment;

Figure 3 is a diagram illustrating the main functional features of a client according to the
present exemplary embodiment;

Figure 4 is a diagram which illustrates the main file structures on the tape backup apparatus to
facilitate a backup operation according to the present exemplary embodiment;

Figure 5 is a diagram illustrating the main functional features of a backup apparatus according
to the present exemplary embodiment;

Figure 6 is a flow diagram representing a backup operation according to the present exemplary
embodiment;

Figure 7 is a diagram which illustrates file information gathered during the procedure
described in relation to Figure 6;

Figure 8 is a diagram which represents a redundant file elimination index; and

Figure 9 is a block diagram of backup apparatus according to an embodiment of the present
invention.

Best Mode For Carrying Out the Invention, & Industrial Applicability

As has already been discussed, hitherto known backup schemes and systems offer relatively
good protection for network data, but with a potentially very high administration overhead,
particularly when data restoration from tape is required. Administrators have the added problems of
backups not being run, or failing due to lack of training or knowledge on the part of the client users.
This is especially true of backup scheduling, media handling and maintenance schemes, such as tape
rotation or drive cleaning. Failed backups can result in data loss, which, at best, causes time to be lost
in recreating the data.

Also, as the amount of networked data increases, backup capacity and network bandwidth
become significant limiting factors (a backup takes longer and thus increases network down time
during backup), as does [...]

Consider a 10-client and 2-server network where each client has 2 Gbytes of disk storage
and each server has 20 Gbytes of disk space. A tape storage device would potentially need to have a
capacity of 60 Gbytes to guarantee to completely back up the network.

Further, a typical 10-Base-T network can transfer at most 1 Mbyte of data per second. A full
backup of the above network would take 60,000 seconds, or about 17 hours. During this time, the
network would be unavailable for use for any other means. This would be unacceptable to most users.
The embodiments described herein aim to address at least some of these issues.
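
By way of illustration only, the capacity and transfer-time figures above can be checked with a short calculation. The sketch below is in Python; the function name is hypothetical, the 20 Gbytes per server is the value implied by the stated 60 Gbyte total, and a decimal convention (1 Gbyte = 1000 Mbytes) is assumed.

```python
# Figures from the example network; the unit convention is an assumption.
CLIENTS, CLIENT_GB = 10, 2
SERVERS, SERVER_GB = 2, 20      # implied by the 60 Gbyte total
RATE_MB_PER_S = 1               # stated upper bound for 10-Base-T


def full_backup_estimate(clients, client_gb, servers, server_gb, rate_mb_per_s):
    """Return (total capacity in Gbytes, full-backup transfer time in seconds)."""
    total_gb = clients * client_gb + servers * server_gb
    seconds = total_gb * 1000 / rate_mb_per_s
    return total_gb, seconds


total_gb, seconds = full_backup_estimate(
    CLIENTS, CLIENT_GB, SERVERS, SERVER_GB, RATE_MB_PER_S
)
print(total_gb, "Gbytes,", seconds, "seconds,", round(seconds / 3600, 1), "hours")
```

The result, roughly 16.7 hours, rounds to the 17 hours quoted in the text.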

Figure 1 is a diagram, which illustrates a general prior art networked computer environment
100. As shown, a plurality of computer systems, or clients, (designated 110a, 110b, etc) are connected
to a computer network 120. In this example, the network is a LAN (local area network), such as an
Ethernet, which supports the TCP/IP data communications protocol. Also connected to the LAN is a
number of servers (designated 130a, 130b, etc). The servers may be, for example, file servers, print
servers, email servers, or any combination thereof. In the diagram, a tape drive 140 is shown
connected to the server 130a, where the server 130a is a file server. The data stored on the file server
130a, in on-line storage (where on-line indicates that the storage is accessible at a given instant) such
as a hard disk drive, is backed up to the tape drive 140 on a daily basis, typically to a new tape each
day. Tape media is termed off-line storage, since when a tape is removed it is no longer accessible. A
backup operation is scheduled and controlled by backup software, which runs as a background process
on the file server 130a. The backup software schedules the backup to happen at, for example,
11:00 pm each night, and controls the transfer of data from the file server 130a to the backup apparatus.


Figure 2 is a diagram, which illustrates a networked computer environment 200 modified for
operation according to the present embodiment. As shown, a plurality of clients (designated 210a,
210b, etc) are connected to a computer network 220. In this case also, the network is a LAN (local
area network), such as an Ethernet, which supports the TCP/IP data communications protocol. A
number of servers (designated 230a, 230b, etc) are connected to the LAN 220. The servers, as above,
may be file servers, print servers, email servers, or any combination thereof. Also connected to the
LAN 220 is a tape backup apparatus 240 according to the present embodiment. The tape backup
apparatus 240 is shown to include a hard disk drive device 242 in addition to a tape drive mechanism
244. The operation of the tape backup apparatus 240 will be described in more detail below.

In general terms, servers can be thought of as more powerful clients, or clients with large
amounts of disk storage. Thus, for the sake of convenience and unless otherwise stated, the term
"client" when used hereafter shall be taken to mean either a server or a client in a network
environment. Further, for ease of description only, the term "client" shall also be taken to include any
device or apparatus which stores data locally which can be backed up remotely.

Further, unless otherwise indicated, only one client system 210a will be considered, although 
it will be understood that the other clients operate in the same manner. 
The client 210a includes a backup agent 215a, which comprises one or more software
routines. The main functional modules of the backup agent 215a are illustrated in the diagram in
Figure 3. Each module comprises one or more software routines, written for example in the C++
programming language, which control the client 210a to process data and communicate with the tape
backup apparatus 240 as described in detail below. The software routines are stored on a hard disk
drive device (not shown) in the client and are loaded into main random access memory (RAM) when
they are required to operate, and a central processor in the client processes the instructions to control
the client to operate in accordance with the present embodiment. The lines illustrated interconnecting
the various modules and diagram blocks represent communications channels which are open some or
all of the time, depending on requirement. The client 210a is a general purpose computer, such as a
PC running the Windows NT 4.0 operating system.

DYNAMIC SCHEDULER MODULE 

In Figure 3, a dynamic scheduler 310 is a module responsible for dynamically initiating a
backup cycle from the client 210a, based on time since the last backup and the local system resources.
A local backup configuration file 312 contains details on a general network-wide policy set by the
network administrator and a local user-defined policy, in terms of a target time delay before data is
protected. For example, one default policy would be to attempt a backup once an hour. The dynamic
scheduler 310 is a background process which runs permanently on the client 210a.

After the target time delay (e.g. 1 hour) has passed, the scheduler 310 assesses the local client
system 210a resources to ensure that the backup can run without seriously impacting the system
performance. If the local client system 210a is heavily loaded, for example at 95% capacity, the
scheduler 310 will retry a short period later (e.g. after 5 minutes) and continue retrying until the
system has enough free resources to run the backup. There is an upper time limit to the retry, for
example 30 minutes, after which time a backup is forced irrespective of system loading. The upper
time limit for retrying is another general policy variable stored locally in the backup configuration file
312.
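
By way of illustration only, the retry policy just described can be sketched as a small decision function. The sketch is in Python; the class and function names and the use of a single load fraction are assumptions made for the example, while the default values mirror the examples given in the text.

```python
from dataclasses import dataclass


@dataclass
class SchedulerPolicy:
    """Hypothetical stand-in for the policy values held in configuration file 312."""
    target_delay_s: int = 3600    # e.g. attempt a backup once an hour
    retry_interval_s: int = 300   # e.g. retry after 5 minutes
    max_retry_s: int = 1800       # e.g. force the backup after 30 minutes
    load_threshold: float = 0.95  # e.g. "heavily loaded" at 95% capacity


def next_action(since_last_backup_s, retrying_for_s, load, policy):
    """Return 'wait', 'retry_later' or 'start_backup' for the current state."""
    if since_last_backup_s < policy.target_delay_s:
        return "wait"             # target time delay has not yet passed
    if retrying_for_s >= policy.max_retry_s:
        return "start_backup"     # forced irrespective of system loading
    if load >= policy.load_threshold:
        return "retry_later"      # try again after policy.retry_interval_s
    return "start_backup"
```

A caller would invoke `next_action` once per retry interval, accumulating `retrying_for_s` from the moment the target delay first elapsed.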

Once the local client system resources allow, the scheduler 310 communicates with the tape
backup apparatus 240 to request a backup slot. The tape backup apparatus 240 will allow the backup
job to start if the network bandwidth can support it. If there are already other backups running from
other clients, which are using up all of the network bandwidth allocated to backups, the tape backup
apparatus 240 communicates with the client 210a to refuse the request. As a result, the scheduler 310
returns to the retry cycle and waits until the tape backup apparatus 240 gives permission to progress.

Because the backup agent 215a dynamically [...] the overall [...]

The dynamic scheduler 310 also has responsibility for [...] modules when required.
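
By way of illustration only, the slot request and refusal exchange between the scheduler 310 and the tape backup apparatus 240 might be modelled as follows. The sketch is in Python; the class name, the per-client rate figures and the fixed bandwidth budget are all assumptions for the example, since the text does not specify how bandwidth is accounted.

```python
class BackupSlotAllocator:
    """Grants backup slots while the bandwidth allocated to backups is not used up."""

    def __init__(self, budget_kbps):
        self.budget_kbps = budget_kbps
        self.active = {}  # client id -> granted rate in kbit/s

    def request_slot(self, client_id, rate_kbps):
        """Return True if the backup may start, False if the request is refused."""
        used = sum(self.active.values())
        if client_id in self.active or used + rate_kbps > self.budget_kbps:
            return False  # refused: the client returns to its retry cycle
        self.active[client_id] = rate_kbps
        return True

    def release_slot(self, client_id):
        """Free the bandwidth held by a client whose backup has finished."""
        self.active.pop(client_id, None)
```

A refused client simply asks again on its next retry, so no queue is needed on the apparatus side in this simplified model.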

ACTIVE FILE MANAGER MODULE 

An active file manager module (AFM) 320 monitors which files are to be opened by the
backup agent 215a for backup. Before a file is opened, the AFM 320 checks the client's file
system 322 to see if the file is already in use by another program running on the client 210a.
If the file is already in use, the AFM 320 waits until the file is in a "safe" state for the backup
agent 215a to back it up. Only when the file is "safe" does the AFM 320 allow the backup agent 215a
to open the file. A file can be "safe" even if the file is locked and in use. [...] backup
operation, the AFM 320 automatically preserves the data to be backed up by sending the
respective blocks directly to the tape backup apparatus 240 over the network. The tape backup
apparatus 240 can then manage or re-order the out of order backup blocks, thus preserving the
original state of the file from when the backup started.
For example, consider a database of customer addresses (not shown), which is being backed
up. During the backup, a user changes one of the entries in a part of the database that has not yet been
backed up. The AFM 320 immediately sends the old address to the backup server, and when the
backup reaches this point it skips the updated address in the database. This method means that when a
database application thinks it has written data to disk, it has indeed been written to disk, and not
cached somewhere else. Thus, there is no possibility of data loss or corruption if the server 240 were
to crash during the backup.

The AFM 320 can be user-configured to determine when a file is "safe" to backup. For
example, the AFM 320 can use a write inactivity period to decide this, which would be one of the
general policy values stored in the backup configuration file 312. In order to ensure that a backup
copy of a file does not contain a partial transaction, the AFM 320 monitors the period of time that
passes without a write taking place. For example, if the time is set to 5 seconds, the file is not safe
until there is a 5 second period when there are no writes active, and at this point the file can be backed
up. There is also a value for the period of time after which the AFM 320 gives up trying to find a safe
state. For example, if this time is set to 60 seconds, then the AFM 320 will try for one minute to find a
5 second period with no writes.
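
By way of illustration only, the write-inactivity test can be sketched as a function over observed write timestamps. The sketch is in Python; the function name is hypothetical, and treating the start of monitoring as if it were a write is an assumption made so that the quiet period is always measured from a known point.

```python
def find_safe_time(write_times, start, quiet_s=5.0, give_up_s=60.0):
    """Return the earliest time at which the file has seen no writes for
    quiet_s seconds, or None if that does not happen within give_up_s of start.

    write_times: timestamps, in seconds, of writes observed on the file.
    """
    deadline = start + give_up_s
    last_write = start  # assumption: monitoring start counts as a write
    for t in sorted(w for w in write_times if w >= start):
        if t - last_write >= quiet_s:
            break       # a full quiet period elapsed before this write
        last_write = t
    safe_at = last_write + quiet_s
    return safe_at if safe_at <= deadline else None
```

With writes at 1, 3, 4 and 12 seconds, the file becomes safe at 9 seconds, which is 5 seconds after the write at 4 seconds. A file written every 2 seconds never reaches a safe state and the function returns `None` once the 60 second limit is exceeded.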

Some applications, notably databases, operate a number of files simultaneously (e.g. data files
and index files) and to assure the overall integrity of such files they must be configured as a "group".
A group defines a number of files that must be backed up from a collective "safe" state, and only when
every file in the group is simultaneously in a "safe" state can each file be backed up. This grouping is
performed automatically by the AFM 320 when it detects one of the major database types (e.g.
Exchange, Notes, SQL Server, and Oracle). Further, the AFM 320 may be configured to treat
user-defined lists of files as groups, with the group definitions being stored in the backup
configuration file 312.



FILE DIFFERENCING MODULE 

A file differencing module (FDM) 330 is a module in the backup agent 215a that selects the
files to be backed up by determining which files have changed or been added since the last backup.
The module achieves this by reading the current directory tree of the local file system 322 and
checking each file's modified time/date against the entries in a cached Directory Tree File (DTF) 332
generated from the last backup. Modified files will have different times and dates, and new files will
have no corresponding entry. Modified files are marked as "Modified" and new files are marked as
"New". Note that for the first backup after installation all files will be new files.
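
By way of illustration only, the comparison against the cached DTF can be sketched as a dictionary lookup. The sketch is in Python; representing the directory tree and the DTF as plain mappings from path to modified time is an assumption for the example, not the format of the DTF 332.

```python
def classify_files(current_tree, cached_dtf):
    """Compare two {path: modified_time} mappings and mark changed files.

    Files whose time stamp is unchanged are omitted from the result.
    """
    marks = {}
    for path, mtime in current_tree.items():
        if path not in cached_dtf:
            marks[path] = "New"        # no corresponding entry in the DTF
        elif cached_dtf[path] != mtime:
            marks[path] = "Modified"   # time/date differs from last backup
    return marks
```

On a first backup the cached mapping is empty, so every file is marked "New", as the text notes.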

Before the list of modified or new files is further processed, the list is filtered for excluded
files (such as temporary files, Internet cache files, swap files, etc). The policy for excluding files is
held in the local backup configuration file 312, and is generated from a general network policy set by
the network administrator and also from a user-defined set of excluded files or directories.
The next stage is to determine which of the new files are already held on the tape backup
apparatus 240, and are thus redundant. For example, if there has already been a backup of a
Windows95 workstation, then subsequent backups will determine that the Windows95 operating
system files are redundant. The FDM 330 first sends the list of the selected new files to the tape
backup apparatus 240. The list contains for each file a 32-bit CRC code (calculated for the respective
name, date/time stamp and file size information). The tape backup apparatus 240 returns a list of the
files that match its existing backup file contents, and for each file it also returns a signature (in this
case a 32-bit CRC checksum calculated over the actual file data) and an indication of the location of
the file on the backup server. For each of the potentially redundant files in the list, the FDM 330
generates a respective signature value and compares it with the value returned by the tape backup
apparatus 240. Where the signatures match, the file marking is changed from "New" to "Redundant".
Thus, the output of the FDM 330 is a list of all files which are new or modified since the last backup,
marked as:

    "Redundant", copy already held on backup server;
    "New", new file, thus no need for block differencing; or
    "Modified", use block differencing to determine which blocks changed.

As well as files, the FDM 330 identifies any modifications to the system information used to
rebuild the system in disaster recovery. This covers areas such as NetWare NDS partitions,
partition information, file system types and details (e.g. compressed), bootstrap partitions (e.g. MBR,
NetWare DOS partition).
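
By way of illustration only, the two-stage redundant file check described above (a CRC over the file metadata, confirmed by a CRC over the file data) can be sketched as follows. The sketch is in Python and uses `zlib.crc32` for both 32-bit codes; the function names, the way the metadata fields are encoded before hashing, and the dictionary standing in for the reply from the tape backup apparatus 240 are assumptions for the example.

```python
import zlib


def metadata_crc(name, mtime, size):
    """32-bit CRC over name, date/time stamp and size (encoding is assumed)."""
    return zlib.crc32(f"{name}|{mtime}|{size}".encode("utf-8"))


def data_signature(data):
    """32-bit CRC checksum over the actual file data."""
    return zlib.crc32(data)


def mark_redundant(new_files, server_index):
    """Mark each new file as 'Redundant' or 'New'.

    new_files: {name: (mtime, size, data)} for files selected as new.
    server_index: {metadata_crc: (signature, location)}, a stand-in for the
        list of matches returned by the apparatus.
    """
    marks = {}
    for name, (mtime, size, data) in new_files.items():
        match = server_index.get(metadata_crc(name, mtime, size))
        if match is not None and match[0] == data_signature(data):
            marks[name] = "Redundant"  # copy already held on backup server
        else:
            marks[name] = "New"
    return marks
```

The metadata CRC acts as a cheap first filter, so the full data signature only needs to be computed on the client for files the apparatus reports as potential matches.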


BLOCK DIFFERENCING 

A block-differencing module (BDM) 340 determines which blocks in each file have changed
since the last backup. The process of identifying the changed portions (deltas) of files is performed by
two basic processes. The first process is a sliding fingerprint (SFP) process 342. In general, a
fingerprinting process is a probabilistic algorithm, where the probability of a failure is made far less
than the probability of an undetected failure of the underlying storage and communication media (for
further detailed information on fingerprinting, the reader is referred to the book by Richard Karp and
Michael Rabin, "Efficient randomised pattern matching algorithms", Harvard University Centre for
Research in Computing Technology, TR-31-81, Dec 1981). The second process involves active
detection of writes to regions of files; this technique requires a process called a file delta accelerator
(FDA) process 345. The FDA process 345 is a background process which operates all the time to
monitor the client's operating system 312 write calls and maintain a log 347 of which logical regions
of which files have been modified.

The FDA process 345 is more efficient for files that are updated in place (e.g. databases),
while the SFP process 342 is far more efficient for document files that are entirely (or largely)
rewritten with each update - although, only a small portion of the file may have been modified. As
will be described, the present embodiment makes use of a combination of an SFP process 342 and a
FDA process 345. As each modified file is opened for backup, the FDA process log 347 is checked to
see how much of the file has been modified. If more than a threshold percentage, for example 5-10
percent, has been modified, and if the absolute size of the changes is smaller than a given size (e.g. 2
MB) then the SFP process 342 is selected as the appropriate process to use. Otherwise, the FDA-
detected regions are used. Note that if the local client 210a crashes without a 'clean' FDA shutdown,
all FDA log 347 information is totally invalidated, so the BDM 340 must temporarily revert to the SFP
process (or a conventional incremental backup) when the next backup is performed.
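
By way of illustration only, the selection rule between the SFP process 342 and the FDA-detected regions can be written as a short function. The sketch is in Python; the function and parameter names are hypothetical, and the defaults (5 percent, 2 MB) are taken from the examples in the text.

```python
def choose_delta_process(file_size, modified_bytes, fda_log_valid=True,
                         threshold_fraction=0.05,
                         max_change_bytes=2 * 1024 * 1024):
    """Return 'SFP' or 'FDA' for a modified file, following the stated rule."""
    if not fda_log_valid:
        return "SFP"  # log invalidated, e.g. after an unclean FDA shutdown
    over_threshold = modified_bytes > threshold_fraction * file_size
    small_change = modified_bytes < max_change_bytes
    if over_threshold and small_change:
        return "SFP"
    return "FDA"      # otherwise use the FDA-detected regions
```

A 1 MB document with 200 KB rewritten selects the SFP process, whereas a 100 MB database with a few kilobytes updated in place uses the FDA log.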
The SFP process 342 divides an updated file into equal-sized "chunks", the size of which
varies depending on the file size. Each chunk has a 12-byte fingerprint calculated for it, and the
fingerprints are sent with the backup data for the file to be stored by the tape backup apparatus 240.
When a file is to be checked with the SFP process 342, the BDM 340 communicates with the tape
backup apparatus 240 to download the fingerprint set for the file in question. It is also possible to
locally cache fingerprint sets for files that are frequently accessed. The SFP process 342 then
calculates the fingerprint function for the updated version of the file, starting from the first byte and
using a chunk size the same size as for the last backup of the file. Then the SFP process 342 compares



WO 99/12098 



PCT/GB98/02603 



the resulting new first fingerprint with the previous fingerprint set to find a match. If there is a
match, then the chunk starting at that byte is already present on the tape backup apparatus 240, and
thus need not be backed up. If there is no match, then the fingerprint function calculation is repeated
but starting at the next (second) byte, and so on.
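
By way of illustration only, the sliding comparison can be sketched as below. The sketch is in Python and substitutes a 12-byte BLAKE2b digest from `hashlib` for the Karp-Rabin style fingerprint function; that substitution, the function names, and the choice to jump a whole chunk forward after a match are assumptions for the example. A true rolling fingerprint would update in constant time per byte rather than rehashing each window.

```python
import hashlib


def fingerprint(chunk):
    """12-byte fingerprint of a chunk (BLAKE2b used here as a stand-in)."""
    return hashlib.blake2b(chunk, digest_size=12).digest()


def fingerprint_set(data, chunk_size):
    """Fingerprints of the equal-sized chunks of a previously backed up file."""
    return {fingerprint(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)}


def sliding_match(new_data, old_fingerprints, chunk_size):
    """Return (matched, unmatched) for the updated file.

    matched: offsets in new_data where an already stored chunk was found.
    unmatched: (offset, bytes) runs that still need to be backed up.
    """
    matched, unmatched = [], []
    pos = run_start = 0
    while pos < len(new_data):
        window = new_data[pos:pos + chunk_size]
        if fingerprint(window) in old_fingerprints:
            if run_start < pos:
                unmatched.append((run_start, new_data[run_start:pos]))
            matched.append(pos)
            pos += len(window)  # chunk already held; continue after it
            run_start = pos
        else:
            pos += 1            # no match: repeat starting at the next byte
    if run_start < len(new_data):
        unmatched.append((run_start, new_data[run_start:]))
    return matched, unmatched
```

For an old file `AAAABBBBCCCC` with 4-byte chunks, inserting `xy` after the first chunk leaves all three original chunks matched and only the two inserted bytes to send, which is the case a fixed-offset comparison would miss.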

For all files that are "Modified", the block differencing process is performed as described
above, producing a stream of modified chunks (plus new fingerprints) for each file. For "New" files,
there is no need for block differencing, so the entire file is broken up into chunks (the initial chunk
size depends on the file size) with a new fingerprint being calculated for each chunk. All these file
chunks (plus new fingerprints) are sent to a data transfer module 350, described below in more detail,
to be compressed and sent to the tape backup apparatus 240.

DATA TRANSFER MODULE 

A data transfer module (DTM) 350 performs the actual transfer of the backup data from the
backup agent 215a to the tape backup apparatus 240. As chunks of backup data (plus fingerprints) are
received from the BDM 340, they are compressed and added to a backup stream of data for transfer to
the tape backup apparatus 240. There is a delay between the transfer of each chunk due to the times
taken to obtain the chunk and compress it, and the need to limit the client backup transfer data rate.
This method breaks the backup stream into small discrete pieces, thus making it less network
bandwidth intensive. The selected data transfer rate and delay is determined by the tape backup
apparatus 240, as will be described.
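
By way of illustration only, compressing each chunk and spacing the transfers to respect a rate limit can be sketched as a generator. The sketch is in Python and uses `zlib` for compression; the function name, the choice of compressor and the simple size-over-rate pause are assumptions for the example, since the text does not specify how the delay is derived.

```python
import zlib


def paced_stream(chunks, max_bytes_per_s):
    """Yield (payload, delay_s) pairs for a sequence of backup chunks.

    Each chunk is compressed, and delay_s is the pause to observe after
    sending it so that the average rate stays at or below max_bytes_per_s,
    a limit that the tape backup apparatus would select.
    """
    for chunk in chunks:
        payload = zlib.compress(chunk)
        yield payload, len(payload) / max_bytes_per_s
```

A sender would transmit each payload and then sleep for the accompanying delay before fetching the next chunk.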

All the differences in all the changed files since the last backup are stored in backup directory
files (BDFs). BDFs also contain a fingerprint for each respective file chunk, and RFE index
information (date/time stamps, signatures, etc) for each changed file, which will be described below.

All backup data is indexed so that it can be reconstructed from the various BDFs on the tape
backup apparatus 240. Pointers to areas of the various BDFs are used for reconstruction purposes, and
these pointers are held in the DTF, which indexes all the files on the tape backup apparatus 240.

An exemplary DTF and associated BDFs, BDF1 and BDF2, are illustrated in Figure 4. For
example, with reference to Figure 4, consider the scenario in which a file, File1, was originally backed
up in BDF1 400, and then the first chunk, Chunk1a, was modified and was stored in BDF2 405, as
Chunk1b. Then, the entry in the DTF 410 has a pointer, Pointer1, to BDF2 for the first chunk,
Chunk1b, and also a pointer, Pointer2, to BDF1 for the unchanged chunks, Chunk2a and Chunk3a, of
the file. Thus, for a restore operation of File1, File1 comprises Chunk1b (in BDF2) and all chunks
from Chunk2a (in BDF1).
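
By way of illustration only, the File1 scenario can be modelled with dictionaries standing in for the BDFs and the DTF. The sketch is in Python; the data layout, the sample chunk contents and the use of one pointer per chunk (rather than one pointer covering a run of unchanged chunks) are simplifying assumptions for the example.

```python
# Hypothetical contents of the two backup directory files.
bdf1 = {"Chunk1a": b"old-1", "Chunk2a": b"data-2", "Chunk3a": b"data-3"}
bdf2 = {"Chunk1b": b"new-1"}
bdfs = {"BDF1": bdf1, "BDF2": bdf2}

# DTF entry for File1: ordered (BDF name, chunk name) pointers.
dtf = {
    "File1": [("BDF2", "Chunk1b"), ("BDF1", "Chunk2a"), ("BDF1", "Chunk3a")],
}


def restore(file_name, dtf, bdfs):
    """Reassemble a file by following its DTF pointers into the BDFs."""
    return b"".join(bdfs[bdf][chunk] for bdf, chunk in dtf[file_name])


print(restore("File1", dtf, bdfs))
```

The superseded Chunk1a remains in BDF1, so an older DTF that still points to it can restore the earlier version of the file.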

For "Redundant" files, the entry in the DTF 410 is a copy of the pointer(s) to the already
existing copy of that file on the tape backup apparatus 240.

Every time a backup is performed, a new DTF is generated. The new DTF is sent to the tape
backup apparatus 240 and also cached to the local client system. Since only a small number of files
will typically have changed since the last backup, the new DTF can use pointers to the previous DTF
for those parts of the directory that are unchanged.

RESTORE MODULE

A restore module 350 performs restore operations using the DTFs to generate a directory tree
of all files that can be restored. The restore module 330 can either use the local cached copies of the
DTFs (for recent views) or download other ones from the tape backup apparatus 240 (for older views).
If the restore is being performed from a mounted tape (e.g. a historical archive) in the tape backup
apparatus 240, then there will be a complete second set of DTFs to provide an alternate historical
restore [...]

Since the DTF generated for every delta backup is a (virtual) complete list of all files, the user
can change the restore view to an earlier backup and restore an older copy of a file. By default, the
initial restore tree is from the latest backup.

When a user selects to restore a specific file from a specific backup, the DTFs are used to
identify which portions of which BDF contain the file data. This data is then copied from the tape
backup apparatus 240 to the backup agent 215a, decompressed, and written to the specified location in
the client storage.

A directory tree of all files which can be restored is generated and viewed in a graphical user
interface, for example an extension to the Windows Explorer program available on Microsoft
Windows 95 and Windows NT4. This provides a familiar and easy to use environment for users to
restore their backup data. This process covers restoration from the local client 210a. However, this
does not apply to server data, particularly for NetWare, which does not have a graphical console. In
this case, the server data restore tree would need to be available through a remote workstation console
(not shown). There are two methods by which this could be done:

* if a user logs in as a backup administrator in the tape backup apparatus administration
interface, then display ALL server volumes for restore; or

* alternatively, use the configured server drive mappings to indicate which server volumes to
display in the restore directory tree. File security information stored in the BDFs is used to filter the
restore tree based on the user security used for each server drive mapping.






TAPE BACKUP APPARATUS 

According to the present embodiment, the tape backup apparatus 240 functionally comprises a
set of modules each consisting of one or more control programs. The programs may comprise
software routines, written for example in the C++ programming language, but preferably comprise
firmware stored in non-volatile memory such as read only memory (ROM), or hardware comprising
application specific integrated circuits (ASICs). In the present embodiment, the tape backup apparatus
240 is in the form of a dedicated, networked appliance, as will be described below in more detail with
reference to Figure 9.

The tape backup apparatus 240 comprises two levels of data storage for backup data. The first
level of data storage is on-line, random access storage, in the form of a hard disk drive device 244 of
sufficient capacity potentially to at least store all data from all local client storage. From this hard disk
drive device 244, any client can restore any of its most recently backed up files or whole file system,
without having to address any tape storage. Hitherto, tape backup systems known to the applicants
rely on tape backup as the first, and typically only, level of backup. The second level of storage
comprises off-site, off-line tapes, which are removed from the tape backup apparatus 240 by the
system administrator. Data on a tape can be accessed by a client once the tape has been re-loaded
and mounted into the tape backup apparatus 240, since the tape can be 'mounted' as a volume of the file
system of the tape backup apparatus 240. Of course, data recovery from tape will always take longer
than data recovery from on-line storage.

The tape backup apparatus 240 according to the present invention provides extremely
convenient access by a client to recover one or more lost files, which have been backed up, without the
need to find an archived tape. Typically, tape-based backup systems use a different tape for each day
of the week, and thus require use of a particular tape for the day on which the data was last backed up
to restore any data which has been lost. This means that generally tape-based backup systems must
regularly (e.g. once a week) repeat a full backup of all data (including unchanged files that have already
been backed up) to prevent needing an unmanageable number of tapes to restore a file or system. The
present tape backup apparatus 240 maintains in on-line storage 244 an instance of every backed up
client file and any or all files can be restored at any time by a client-initiated process. This also means
that there is no longer any need to repeat a backup of unchanged files - only the changes are sent after
the first backup.

The major functional modules in the tape backup apparatus 240 will now be described in
association with the functional block diagram in Figure 5.

BACKUP DYNAMIC SCHEDULER 

A backup dynamic scheduler 500, or backup scheduler, for the tape backup apparatus 240,
works in conjunction with the dynamic scheduler 310 in the client backup agent 215a. The role of the
backup scheduler 500 is to control the flow of backup jobs in order to control the network bandwidth
used by the backup data traffic.

The backup scheduler 500 varies the active jobs so that the amount of backup traffic over the
network is 'throttled' (or restricted), and is kept within a defined bandwidth such as 5%. Thus, the
backup traffic will be guaranteed never to use more than 5% of the network bandwidth. If there are
too many backup jobs or too much changed data then the time to complete the backup jobs will
extend. Thus the tape backup apparatus scheduling of the backup jobs may mean that data cannot
always be protected within the target period (e.g. 1 hour). The parameters and their respective
thresholds, which are used to determine whether a backup operation can be allowed, are stored in a
backup configuration file 504. There are two basic methods that can be used to throttle the backup
traffic:

1. Each backup agent transfers the backup data at a specified controlled rate (e.g. 50KB/sec)
by adding artificial delays between small backup data blocks. For example, a single client sending
16K blocks with 200ms delays between blocks uses 5% of available network bandwidth for a 10Mbit
Ethernet.

2. Each backup agent, when it is active, bursts the backup data, aiming to complete the
backup in a short time. However, the size of the backup data blocks needs to be limited (e.g. to 16K)
so that the backup does not use all available network bandwidth. A single client streaming 16K blocks
uses approximately 25% of the available network bandwidth, and two streaming clients use 45% etc.
The throttling will then sequence the jobs, so that only a small number (e.g. 2) are active
simultaneously, and add large delays between jobs so that the overall average bandwidth used is 5%.
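
By way of illustration only, the first throttling method can be viewed as pacing: the delay inserted after each block is chosen so that the average payload rate meets a target. The sketch below is a minimal example under stated assumptions; the helper names `inter_block_delay` and `bandwidth_fraction` are hypothetical, transmission time and protocol overhead are ignored, and the results are therefore only roughly comparable with the percentages quoted in the text.

```python
def inter_block_delay(block_bytes: int, target_bytes_per_sec: float) -> float:
    """Seconds to wait per block so the average payload rate equals the target.

    Assumption: the time taken to transmit the block itself is negligible.
    """
    if target_bytes_per_sec <= 0:
        raise ValueError("target rate must be positive")
    return block_bytes / target_bytes_per_sec


def bandwidth_fraction(bytes_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of the link used by a payload rate (protocol overhead ignored)."""
    if link_bits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    return bytes_per_sec * 8 / link_bits_per_sec
```

For instance, `inter_block_delay(16_000, 50_000)` gives the pause needed to hold 16,000-byte blocks to 50,000 bytes per second, and `bandwidth_fraction(50_000, 10_000_000)` gives the corresponding share of a 10 Mbit/s link before overhead.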

The backup scheduler 500 also includes a prioritisation scheme based on the time jobs have
been waiting to start, the estimated amount of data sent in each job, and the available network
bandwidth. The prioritisation scheme variables are stored in a prioritisation file 502 on the tape
backup apparatus 240. For example, if a backup request from a backup agent is refused due to
insufficient network bandwidth, the time of the first refusal is logged, and subsequent requests are
compared with other outstanding requests against the length of time since the first refusal. The job
that has been waiting the longest will be started first. An adaptive algorithm in the backup scheduler
500 that 'learns' the average job size from the system, by averaging over all jobs received in a fixed
time period of, for example, one week, can determine the estimated size of the jobs.
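
The longest-waiting-first rule and the averaged job-size estimate can be sketched as follows. This is a minimal illustration, not the scheduler's actual implementation: the class `RefusalLog`, its methods and `average_job_size` are hypothetical names, and times are plain numbers in arbitrary units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RefusalLog:
    """Records when each client's backup request was first refused."""
    first_refusal: Dict[str, float] = field(default_factory=dict)

    def refuse(self, client: str, now: float) -> None:
        # Only the first refusal time is kept; later refusals do not reset it.
        self.first_refusal.setdefault(client, now)

    def next_job(self) -> Optional[str]:
        """Return and remove the client that has waited longest, if any."""
        if not self.first_refusal:
            return None
        client = min(self.first_refusal, key=self.first_refusal.get)
        del self.first_refusal[client]
        return client


def average_job_size(sizes: List[int]) -> float:
    """Estimate job size as the mean of the sizes seen in the averaging window."""
    return sum(sizes) / len(sizes) if sizes else 0.0
```

Keeping only the first refusal time means a client that retries repeatedly does not lose its place in the queue.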

The backup scheduler 500 also adapts to the network conditions so that if the network
consistently has much more than 5% available bandwidth, then the backup scheduler will sequence
backup jobs to use more network bandwidth (e.g. 10%) during slack periods.

The tape backup apparatus administrator may configure the backup scheduler 500 to give
priority to the backup jobs at the expense of the network bandwidth, in which case the job sequencing
priorities are assigned based on the time since the last backup rather than network bandwidth.


REDUNDANT FILE ELIMINATION MODULE 

A redundant file elimination (RFE) module 510 maintains an index, the RFE index 512, in the
tape backup apparatus 240. The RFE index 512 is a database which is either held in memory or on the
hard disk drive device 244, listing all the files held in the primary storage. The index 512 is used by
the backup apparatus 240 to determine whether files requested to be backed up by the local client 215a
are already backed up by another client and, therefore, do not need to be backed up again. The RFE
index 512 holds a file record for each file stored. Each file record, as illustrated in Figure 8, only takes
approximately 28 bytes (4 bytes of file ID, 4 bytes of client ID, 8 bytes of file size, 4 bytes of CRC
(calculated over the name/size/modified date and time stamp) and 8 bytes of file signature), so that even
with millions of files to store the RFE index memory requirements are not excessive.
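
As a rough check on the record size, assume field widths of 4 bytes (file ID), 4 bytes (client ID), 8 bytes (file size), 4 bytes (CRC) and 8 bytes (signature). A packed record then occupies 28 bytes, so one million records need about 28 MB. The sketch below illustrates this; the field order, the little-endian layout and the name `pack_record` are assumptions made for the example and are not taken from Figure 8.

```python
import struct

# Assumed layout, little-endian with no padding:
# file ID (4), client ID (4), file size (8), CRC (4), signature (8).
RFE_RECORD = struct.Struct("<IIQI8s")


def pack_record(file_id: int, client_id: int, size: int, crc: int, signature: bytes) -> bytes:
    """Serialise one index record into its fixed-width binary form."""
    return RFE_RECORD.pack(file_id, client_id, size, crc, signature)


def index_bytes(file_count: int) -> int:
    """Total bytes needed to hold file_count records."""
    return file_count * RFE_RECORD.size
```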

The backup agent 215a sends to the tape backup apparatus 240 a list identifying new files to
be backed up. The list only contains the four bytes of CRC for each file, where the CRC contains
sufficient information to allow file comparison and identification. Using the CRC information, each
file in the list is compared with entries in the RFE index 512 for a match. If any matches are found,
the tape backup apparatus 240 returns a list of these files, plus their signatures and locations, as
already mentioned above. The backup agent 215a compares the returned signatures with signatures
generated for its local files to determine if the files are exactly the same.

BACKUP STORAGE MODULE 

Each backup agent 215 sends its backup data and the most recent DTF to the tape backup
apparatus 240 for storage by the backup storage module (BSM) 520. The backup data comprises the
stream of file data to be backed up. The backup storage module 520 stores the files in on-line media
including:

* Full area 524 (which holds the baseline full backup and the respective fingerprint data)

* Delta area 526 (which holds changes since the baseline full backup and respective
fingerprint data)

* Merge area 528 (which is the workspace used during merge of changes into new baseline
full backups)

The hard disk drive device 244 also holds working files such as the prioritisation file 502 and
the backup configuration file 504, and, where applicable, the RFE index 512. The baseline full
backups in the full area 524 consist of a single backup data file and directory tree file for each client
215 and these are used as a baseline for the delta backup data. For example, if a delta is the first block
of a file, then to restore this file the first block is obtained from the delta backup and the rest is
obtained from the baseline full backup. The full backups are initialised when the first backup is
performed on the client system. However, if the initial full backups were then left untouched, over
time there would be an unmanageably large number of delta backups (which would impact storage
space and restore time). Thus, there must be a regular update of the baseline full backups by merging
the delta data into a new baseline full backup, as will be described below.
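
The restore rule in the example above (changed blocks from the delta, everything else from the baseline) can be sketched as follows. This is an illustration under simplifying assumptions: the file is modelled as a list of blocks, the delta as a mapping from block index to replacement data, and the name `restore_file` is hypothetical.

```python
from typing import Dict, List


def restore_file(baseline_blocks: List[bytes], delta_blocks: Dict[int, bytes]) -> bytes:
    """Rebuild a file: take changed blocks from the delta, the rest from the baseline.

    Assumption: the delta only replaces existing blocks and never changes the
    number of blocks in the file.
    """
    restored = [delta_blocks.get(i, block) for i, block in enumerate(baseline_blocks)]
    return b"".join(restored)
```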

The delta area 526 contains the delta backups from each client. Delta backups also each
comprise a backup data file and a directory tree file. Note that the directory tree files are themselves
deltas on the baseline full backup directory tree file.

The merge area 528 is used for merge operations. At pre-determined regular intervals, for
example every month, there is a space-saving operation to merge the oldest delta backups for each
client into a new baseline full backup. Also, there are regular space-saving operations to merge the
hourly delta backups for a 24-hour period into a single delta backup, as described below.


MERGE CONTROL MODULE 

A merge control module (MCM) 530 is responsible for merging multiple delta backups together
with either the baseline full backup or with other delta backups. The purpose of this is to reduce the
amount of on-line capacity used by deltas, while maintaining a reasonable history of changes so that
the user can restore files from up to at least a week ago. Without this merge function, there would be
at least 10 deltas for each day (hourly backups).

The MCM 530 can be configured by the server administrator with merge criteria to suit the
network environment. The criteria are stored in the backup configuration file 504. For example,
keeping hourly backups would not be productive beyond one day. Therefore, one possible default
criterion is to merge the last 24 hours' deltas into one daily delta, for example at 11:59pm each day.
Another possible scenario is for the MCM 530 to reduce the number of daily deltas, at the end of four
weeks, by merging the oldest two weeks of deltas into a new baseline full backup. In this example,
whenever the user requests a restore view, they have the capability to view hourly history for the
current day and at least two weeks of daily history.
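
Merging a day's hourly deltas into one daily delta amounts to folding them together in time order, so that the newest version of each changed block survives. The sketch below is illustrative only; the `Delta` representation (file name mapped to changed blocks) and the name `merge_deltas` are assumptions for the example.

```python
from typing import Dict, Iterable

# Hypothetical representation: file name -> {block index -> block data}.
Delta = Dict[str, Dict[int, bytes]]


def merge_deltas(deltas: Iterable[Delta]) -> Delta:
    """Fold deltas, supplied oldest first, into one delta; newer blocks win."""
    merged: Delta = {}
    for delta in deltas:
        for name, blocks in delta.items():
            merged.setdefault(name, {}).update(blocks)
    return merged
```

The intermediate hourly states are lost after such a merge, which is why the text keeps hourly history only for the current day.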

If the backup storage delta area 526 reaches a pre-determined threshold, for example 95%
capacity, the MCM 530 overrides the merge criteria and performs an immediate merge of the older
deltas into the baseline full backup.

Another function of the MCM 530 is to delete files from the baseline full backup when they
have been deleted from a respective client system and are older than a predefined, user-configurable
period such as one month. The MCM 530 regularly (e.g. weekly) compares the directory tree files of
the baseline full backup with the directory tree files of the delta backups. Since directory tree files
contain a 'snap-shot' of all files present in the file system at the time, deleted files will be present in the
full backup but not in any of the delta backups. After identifying any deleted files, if these files are
older than a predefined period (e.g. one month) then they are removed from the baseline full backup.
In this way, the storage requirements of the baseline full backup are not dramatically increased by old
deleted data. It is still possible to keep copies of old data in archives by using offsite tapes generated
by a tape backup module.
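
The deleted-file check can be sketched as a set comparison between the baseline directory tree and the delta snapshots, followed by an age filter. This is a minimal illustration: the name `files_to_purge`, the per-file timestamp used to judge age, and the 30-day default are assumptions, since the text does not say how a file's age is measured.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set


def files_to_purge(
    baseline_tree: Dict[str, datetime],
    delta_trees: Iterable[Set[str]],
    now: datetime,
    retention: timedelta = timedelta(days=30),
) -> Set[str]:
    """Files in the baseline, absent from every delta snapshot, older than retention.

    baseline_tree maps each path to an assumed per-file timestamp.
    """
    still_present: Set[str] = set()
    for tree in delta_trees:
        still_present |= tree
    return {
        path
        for path, stamp in baseline_tree.items()
        if path not in still_present and now - stamp > retention
    }
```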

A further feature of the MCM 530 is to detect when multiple pointers, from different clients,
point to a single baseline file entry. In this case, a standard merge operation cannot occur, to merge a
delta for one client with the baseline entry, otherwise the entry would be wrong for the other client(s).
The means to overcome this, applied by the MCM, is not to merge the baseline entry with any deltas,
unless all the deltas are the same. One alternative would be to carry out the merge for one client but
modify the deltas for the other client. Another alternative would be to create a new baseline entry for the
other client(s) and then merge the delta with the baseline entry for the first-mentioned client.

TAPE BACKUP MODULE 

As already described, initially all the selected backup data is sent from the backup agents 215
to on-line storage (hard disk drive device) 244 on the tape backup apparatus 240, and is available for
immediate restore from that media. However, this does not provide a complete backup solution, as the
backup data is still susceptible to a disaster since it may still be on the same site as the original data.
Therefore, in accordance with the present embodiment, a regular copy of the tape backup apparatus
data is made to removable storage (in this case tape), so that it can be taken offsite. The backup to
tape step is made on the basis of pre-determined criteria independently of any message received from
clients or elsewhere.

A tape backup module (TBM) 540 provides this capability. The TBM 540 copies the tape
backup apparatus's on-line data on the hard disk drive device 242 to tape media at a pre-determined
customer-scheduled time and day/date. The TBM 540 copies a mirror image of the blocks of data on
the hard disk drive device 244 to tape. Such block-level copying allows large blocks of data to be read
directly from the disk, rather than reading each file one at a time through the file system. This
improves the data rate from disk to tape and thus ensures that the tape is kept constantly supplied
(streaming) with data.

The tape backup operation necessarily runs while the tape backup apparatus 240 is still
actively able to accept and manage backup data from the clients. For this reason, the TBM 540
incorporates active file manager technology, which is described above, to prevent backup corruption.

The backup administrator can schedule the generation of an offsite backup tape by specifying
the day/date and the time of day they want the tape. This configuration information is stored in the
backup configuration file 504. In other words, the administrator can configure the TBM 540 to
produce a complete tape copy of all backup data at a convenient time, for example, to take the tape
with them when they leave work at 5.30pm each weekday. The TBM 540 calculates the start time of a
tape backup based on the administrator's completion time, the amount of data held on the backup
server and data transfer rate of the tape backup apparatus.
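
The start-time calculation reduces to subtracting the expected copy duration from the requested completion time. The sketch below shows this arithmetic only; the name `tape_backup_start` is hypothetical, and a constant transfer rate with no allowance for tape load or rewind time is assumed.

```python
from datetime import datetime, timedelta


def tape_backup_start(completion: datetime, data_bytes: int, bytes_per_sec: float) -> datetime:
    """Latest start time that still finishes by the requested completion time.

    Assumption: the copy runs at a constant rate for its whole duration.
    """
    if bytes_per_sec <= 0:
        raise ValueError("transfer rate must be positive")
    return completion - timedelta(seconds=data_bytes / bytes_per_sec)
```

For example, 3.6 GB at 1 MB/s takes one hour, so a tape wanted at 5.30pm would need to start at 4.30pm under these assumptions.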

The default offsite tape backup schedule is every weekday at 5.30pm. If a tape is loaded in
the drive at the start time of the backup, it will automatically be overwritten with a new full backup.
At the end of the backup, the tape is automatically ejected from the tape backup apparatus so that it is
clearly finished and ready to be taken away by the administrator.

Since deleted files more than one month old will be removed from the baseline full backup,
the offsite tapes can be used as archive storage for such files. By saving an offsite tape at regular
intervals (for example at the end of each week) the user can archive the backup data in case any old
deleted files are ever required in the future. Also, there may be legal requirements to keep backup data
for a period of years. There should be [...] media for archive purposes, so that the regular tape backup
does not overwrite archive tapes.

Since the tape backup media may be used as archives, there must be a way for the backup
agent 215 to access an archive tape for restoring files. The way to do this is to mount the tape
backup apparatus image on the tape 242 as a read-only disk volume, and thus provide access to the
archived directory tree files and backup data files on the tape. To speed up the generation of the
restore tree, the tape copies of the directory tree files can be copied onto the tape backup apparatus
hard disk drive device. After the archive tape is successfully mounted for restore access, the restore
tree view in Windows Explorer provides an archive restore tree, as described above.


DISASTER RECOVERY MODULE 

When a client system 210a completely fails, the tape backup apparatus 240 can be used to
restore the complete data environment onto a new replacement or repaired system. All of the files are
held on the on-line storage 244 of the tape backup apparatus 240. A disaster recovery module (DRM)
550 recovers requested files from the baseline full backup for the client and any deltas. The new
system must have the appropriate operating system installed, and then install the backup agent to
communicate with the tape backup apparatus 240. The restore module 350 of a client is initiated by the
administrator to communicate with the DRM 550 and copy back all of the data from the last backup
(in effect a reverse RFE).

The DTFs on the tape backup apparatus 240 are used to determine the state of the system to be
recovered. There are also options to select an older system state to restore for the disaster recovery by
using previous delta versions of the directory tree files, which would be used if the latest state of the
system were corrupted.

Due to the fact that large quantities of the data need to be transferred over the network to
recover a complete system, there is also an option to schedule the disaster recovery operation. Since
the recovery is performed from the on-line storage 244 in the tape backup apparatus 240, there is no
user intervention required and the recovery can proceed unattended at any time.



BACKUP OPERATION 

A basic backup operation from the client 210a to the tape backup apparatus 240 will now be
described with reference to the flow diagram in Figure 6, which splits client side and tape backup
apparatus side operations.

For the backup agent 215a, the dynamic scheduler 310 schedules a backup operation, in step
600, on the basis of the time lapsed since the last backup and/or the amount of new data, client
loading and/or network loading. When the criteria for the backup operation are met, the dynamic
scheduler 310 issues a request, in step 605, to the tape backup apparatus 240 for a backup slot. The
tape backup apparatus 240 receives the request, and the backup scheduler 500 checks the tape backup
apparatus loading and network loading, in step 610, and accepts or rejects the request. If rejected, the
dynamic scheduler 310 makes further requests until a request is accepted.

Once a request is accepted by the tape backup apparatus 240, in step 615 the FDM 330
compiles a first list of files that have been modified or added since the client was last backed up.

By way of example, assume that the client 210a only stores five files. The statuses of the five
exemplary files stored by the client 210a are illustrated in Figure 7a. Obviously, in practice, the
number of files will be much larger.

File 1 has modified blocks 3 and 7 and a respective modified date and time stamp, and File 2 has a
modified block 5 and a respective modified date and time stamp. These modified files have already
been backed up at least once. The modified portions of the files are highlighted in the diagram using
thick lines. References to the modified blocks are stored in the FDA log 347. File 3 is an existing file,
which has not been modified since the last backup, and File 4 and File 5 are new files. As shown,
File 4 is a 'common' file, which has an exact copy already on the file server 240 for another client, and
File 5 is a 'unique' file which is new to both the client 210a and to the tape backup apparatus 240.
Although the statuses of the new files are shown in Figure 7a, this is purely for ease of explanation
herein and it will be appreciated that in practice the client 210a has no advance information about
whether the tape backup apparatus 240 already contains versions of new files.

The first list, although not specifically shown, includes all files illustrated in Figure 7a except
File 3, since File 3 has not been modified. Having built the first list, in step 620 the FDM 330
compiles a second file list for new files, including File 4 and File 5, as illustrated in Figure 7b. As
shown, the second list contains for each file only a respective 4-byte CRC (calculated over the name,
date/time stamp and file size information). The second list, comprising CRCs, is then transmitted to
the tape backup apparatus 240 in step 623. The 4-byte amount of information per file minimises the
network bandwidth required to send the information to the tape backup apparatus 240, while at the
same time providing enough information for comparison purposes.
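
A list of this kind might be built as sketched below. The example is a minimal illustration: it uses the standard CRC-32 from Python's `zlib` module, whereas the text does not name a particular CRC, and the `name|timestamp|size` key format, the big-endian encoding and the function names are assumptions made for the example.

```python
import zlib
from typing import Iterable, Tuple

# Hypothetical file description: (name, modified timestamp, size in bytes).
FileInfo = Tuple[str, int, int]


def file_crc(name: str, mtime: int, size: int) -> int:
    """32-bit CRC over the file's name, date/time stamp and size."""
    key = f"{name}|{mtime}|{size}".encode("utf-8")
    return zlib.crc32(key) & 0xFFFFFFFF


def build_crc_list(new_files: Iterable[FileInfo]) -> bytes:
    """Concatenate one 4-byte big-endian CRC per new file."""
    return b"".join(file_crc(*info).to_bytes(4, "big") for info in new_files)
```

With four bytes per entry, a list describing 10,000 new files occupies 40,000 bytes regardless of how large the files themselves are.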

The tape backup apparatus 240 receives the request, in step 625, and the RFE module 510
compares the second file list with the entries in the RFE index 512 to find matching files which are
already stored on the tape backup apparatus 240.

An exemplary RFE index is illustrated in Figure 8. The RFE index 512 in Figure 8 includes
entries for three clients: Client 110a, Client 110b and Client 110n. Also shown is a representation of
the tape backup apparatus on-line storage 242, representing the arrangement of files stored therein in
very simple terms, for ease of understanding only (that is, the construction of files in the on-line
storage 242 is not shown in terms of DTFs and BDFs). Figure 8 also shows the association between
each file reference in the RFE index 512 with the files stored in the on-line storage 242, although it
will be appreciated that there is no physical association, such as pointers, stored by the RFE index 512.

In this case, File 4 is a common file (in this example, it was initially stored for Client 110n),
and File 1 is also shown as being a common file (in this example, it was initially stored for Client
110b). There is only one entry for each common file in the RFE index 512, where the entry is
associated with the first client which introduced the file to the tape backup apparatus 240.

Returning to the flow diagram, in step 630, the RFE module 510 compiles and returns a third
list of files, and respective signatures and pointers, for the files that have RFE index entries, as shown
in Figure 7c. Thus, File 4 only is included in the third list, whereas File 5, being a unique file, is
ignored.

The backup agent 215a receives the third list, in step 635, and the BDM 340 calculates a
signature for each file stored on the client 210a which appears in the list. In step 640 the calculated
signatures are compared with the respective received signatures (in this case there is only one
calculated and one received signature for File 4) to confirm which files are already stored on the tape
backup apparatus 240 (i.e. which files are redundant), and thus do not need backing up.
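
The two-stage check (a cheap CRC match on the apparatus, then a signature comparison on the client) can be sketched as follows. This is illustrative only: the text does not specify the signature algorithm, so a truncated SHA-256 digest stands in for the 8-byte signature, and the index layout and function names are assumptions.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical index: CRC -> (stored signature, storage location).
RfeIndex = Dict[int, Tuple[bytes, str]]


def signature(data: bytes) -> bytes:
    """8-byte content signature (truncated SHA-256, chosen only for illustration)."""
    return hashlib.sha256(data).digest()[:8]


def server_match(index: RfeIndex, crcs: Iterable[int]) -> Dict[int, Tuple[bytes, str]]:
    """Apparatus side: return index entries for the CRCs it already holds."""
    return {crc: index[crc] for crc in crcs if crc in index}


def client_confirm(matches: Dict[int, Tuple[bytes, str]], local: Dict[int, bytes]) -> List[int]:
    """Client side: keep CRCs whose local content signature equals the stored one."""
    return [crc for crc, (sig, _loc) in matches.items() if signature(local[crc]) == sig]
```

The second stage matters because two different files can share a CRC; only a matching content signature confirms that the stored copy really is the same file.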

Next, for each modified file (File 1 and File 2), the BDM 340 determines which specific parts
of the files are different, in step 645. For this operation, the BDM 340 communicates with the tape
backup apparatus 240 to retrieve the respective fingerprint information, as illustrated by step 645a.

In step 648 the DTM 350 builds a fourth list, as illustrated in Figure 7d, which shows File 1,
File 2, File 4 and File 5. This list comprises at least some information for all new and modified files.
The data included with each entry in the fourth list are: file name; modified date/time stamp; file
differences (for modified files) or entire file data (for new and non-redundant files); signature; and
pointer (if the file is a new, redundant file, the pointer is included to indicate where on the tape backup
apparatus 240 the file is already located).

Then, in step 650, the DTM 350 transmits the fourth list, as a backup data stream to be backed
up, to the tape backup apparatus 240.

Finally, in step 655, the BSM 520 receives the data and arranges and stores the data in the tape
backup apparatus 240.

The above-described process outlines the steps required for a simple backup operation
according to the present embodiment. The process can be varied or improved upon without moving
away from the scope or the essence of the present invention. For example, more complex RFE
procedures may be applied to cope with partial file redundancy, where the tape backup apparatus
recognises that new files from one client are only slightly different from existing backed up files. As a
result, only the differences between the new files and the already-stored files need to be backed up.

The diagram in Figure 9 is a block diagram which illustrates the components of an exemplary
tape backup apparatus according to the present invention.

In Figure 9, the tape backup apparatus is referenced 900 and includes an interface 905 for
transmitting data between the tape backup apparatus 900 and one or more clients (not shown). The
interface 905 may comprise, for example, a local area network adapter, if the apparatus is configured
to be attached directly to a network, or a SCSI (small computer system interface) adapter, if the
apparatus is configured to be attached directly to a computer. Thus, one or more clients can address
the tape backup apparatus either directly across the network or via the computer.

In the tape backup apparatus 900 a controller 910 controls the operation of all components of
the backup apparatus 900, is responsible for processing messages received from clients and is
responsible for managing the data movement within the apparatus, for example between hard disk
drive device 920 and tape 940. The controller communicates with the other components of the
apparatus via a system bus 912. The controller 910 typically comprises a microprocessor, for example
a Motorola 68000 series microprocessor or an Intel 80386 microprocessor. The operation of the
controller is determined by a program comprising firmware instructions stored in ROM 915. Main
memory 913 comprising RAM is accessible by the controller 910 via the system bus 912.

The hard disk drive device 920 is connected to the interface 905 such that it can receive or
send client data from or to the interface 905. In a particularly preferred embodiment, the tape backup
apparatus 900 includes further functionality to compress data before storing it on the hard disk drive
device 920, thereby reducing the storage capacity requirement thereof. Many well-known
compression algorithms may be used, for example a Lempel-Ziv substitution algorithm.

A read/write processor 925 is connected to the hard disk drive device 920. The read/write
processor 925 is arranged to receive data from the hard disk drive device 920 and convert it into a
form suitable for driving read/write heads 930 of a tape mechanism 935 for storage of the data to tape
media 940, or to receive data from the tape media 940 and convert it into a form suitable for storage on
the hard disk drive device 920. Additionally, the read/write processor 925 includes error
correction/detection functionality, which uses, for example, Reed-Solomon encoding, which is
well-known in the data storage art. The tape media 940 is mounted in the tape mechanism 935, which
loads and ejects the tape media 940, winds the tape media forwards or backwards as required and actuates
the read/write heads 930 as appropriate. For example, the tape heads may be mounted on a rotating
drum, which rotates at an oblique angle to the travel of the tape, such as in a well-known DDS (Digital
Data Storage) tape drive. Alternatively, the tape heads may be mounted for perpendicular movement
in relation to the travel of the tape, such as in digital linear tape recording technology.

The interface 905 and read/write processor typically each comprise one or more appropriately
programmed application-specific integrated circuits (ASICs).

The components of the tape backup apparatus in its preferred embodiment are housed in a
single housing (not shown) for convenience, thereby providing a dedicated data backup and restore
apparatus and solution. The apparatus can be thought of as a novel tape drive comprising extra
functionality and a large, non-volatile data storage facility. In practice, the non-volatile storage must
have a capacity equal to or, most preferably, greater than the capacity of the combined local storage of
all clients using the apparatus as a backup solution. Equally, the tape storage capacity of the apparatus
must be equal to or greater than that of the non-volatile storage. In both cases, the storage capacity [...]

The dedicated data backup and restore apparatus can be configured by a system administrator,
for example for the purpose of setting the daily time for backup of data from the hard disk drive
device to tape, by logging [...] workstation console and configuring the apparatus using an appropriate
user interface.

There is no reason in practice why the components of the apparatus could not be arranged in a
more distributed manner. For example, the controller could be embodied as a standard computer
system, such as a PC, running appropriate software. The PC would then need to support appropriate
storage devices (hard disk drive device and tape drive). In this way, a [...] solution could be achieved,
in a far less convenient manner. In particular, in a distributed solution each component would need to
be individually added to the PC and [...] need to be taken whenever further components or devices
were added to the PC, or the PC's configuration was changed.

[...] may be replaced by an alternative, such as a solid state device. An example would be NV-RAM
(non-volatile RAM), which presently can comprise SRAM (static RAM) with its own power supply
(e.g. a battery), or EEPROM (electrically erasable programmable ROM). Such alternatives would [...]
be technically possible, but, it is believed, not financially viable. [...] media may be replaced by an
alternative removable media, for example, an appropriate NV-RAM [...]

Other variations and improvements will become apparent to the skilled person on reading the
present description.
CLAIMS

1. Tape storage apparatus, comprising:

interface means for connecting the apparatus to one or more clients;

controller means for controlling the apparatus and for processing messages received from the one
or more clients;

primary storage means; and

tape storage means,

wherein the controller is programmed:

to process backup and restore messages received from the one or more clients respectively to
backup to the primary storage means data received from the clients and to restore to said clients data from
the primary storage means; and

to backup to the tape storage means, in accordance with pre-defined criteria, at least some of the
data stored in the primary storage means and to restore to the primary storage means, in accordance with a
respective restore message received from a client, at least some data stored in the tape storage means.

2. Apparatus according to claim 1, wherein the controller is programmed to maintain stored on the
primary storage means at least the most current version of all data received from the clients.

3. Apparatus according to either preceding claim, wherein the controller means is programmed to backup
data stored in the primary storage means to the tape storage means independently of any messages from
the clients.

4. Apparatus according to any one of the preceding claims, wherein the primary storage means comprises
a random access storage means.

5. Apparatus according to any one of claims 1 to 3, wherein the primary storage means comprises
non-volatile random access memory.

6. Apparatus according to any one of the preceding claims, wherein the controller is programmed by
instructions that are stored in non-volatile memory of the controller, said instructions being read and
processed as required by the controller.

7. Apparatus according to any one of the preceding claims, further comprising a housing configured
specifically to house the controller, the interface means, the primary storage means and the tape storage
means.



8. Apparatus according to any one of the preceding claims, not provided with interface means for either
or both a keyboard and visual display unit.

9. Apparatus according to any one of the preceding claims, wherein the controller is programmed for 
storing in the primary storage means baseline backup data and delta backup data. 

10. Apparatus according to claim 9, wherein the controller is programmed for incorporating the delta
backup data into the baseline backup data, to form new baseline data, in accordance with predetermined
criteria.
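
The baseline and delta handling of claims 9 and 10 can be pictured with a small sketch. The representation is assumed for illustration: a baseline is a mapping from file path to content, a delta maps a path to new content or to `None` for a deleted file, and the predetermined criterion is simply a limit on the number of deltas held. None of these details is taken from the claims.

```python
def apply_delta(baseline, delta):
    """Return a new baseline with one delta folded in (baseline is not mutated)."""
    merged = dict(baseline)
    for path, content in delta.items():
        if content is None:
            merged.pop(path, None)  # None marks a file deleted since the baseline
        else:
            merged[path] = content  # new or changed file
    return merged


def consolidate(baseline, deltas, max_deltas=3):
    """Fold all deltas into the baseline once more than max_deltas are held.

    max_deltas is a hypothetical stand-in for the predetermined criteria.
    """
    if len(deltas) <= max_deltas:
        return baseline, deltas
    for delta in deltas:  # oldest first
        baseline = apply_delta(baseline, delta)
    return baseline, []
```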

11. Apparatus according to any one of the preceding claims, wherein the controller is programmed for
receiving from a client a backup request message and for responding in the negative if either or both of the
apparatus loading and the network loading exceed respective predetermined limits.
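
A minimal sketch of the admission check in claim 11 follows. The function name, the expression of loading as a fraction between 0 and 1, and the default limits are assumptions for the example; the claim only requires that each loading be compared with its own predetermined limit.

```python
def respond_to_backup_request(apparatus_load, network_load,
                              apparatus_limit=0.8, network_limit=0.7):
    """Return True to accept the request, False to respond in the negative.

    Loads and limits are fractions of capacity (hypothetical units).
    """
    if apparatus_load > apparatus_limit or network_load > network_limit:
        return False
    return True
```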

12. Apparatus according to any one of the preceding claims, wherein the controller is programmed for
receiving from a client a message including an indication of the particular data the client wishes to
backup, and for responding by indicating which (if any) of the particular data already has a version stored
in the primary storage means.
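
One way to picture the exchange in claim 12 is sketched below. It assumes, purely for illustration, that the client's indication is a list of (name, digest) pairs and that the apparatus keeps a set of digests for data already held in primary storage; the claim does not specify how the data is identified, and the use of SHA-256 here is a substitute chosen for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content digest used to identify data (an assumed mechanism)."""
    return hashlib.sha256(data).hexdigest()


def already_stored(manifest, stored_fingerprints):
    """Return the names in the client's manifest whose content is already held.

    manifest: list of (name, digest) pairs sent by the client before backup.
    stored_fingerprints: set of digests for data in primary storage.
    """
    return [name for name, digest in manifest if digest in stored_fingerprints]
```

The client can then omit the named items from the backup it actually transmits.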

13. Apparatus according to any one of the preceding claims, wherein the controller is programmed to
respond to a request message received from a client to restore to the client particular data, comprising
combining respective delta and baseline data stored in the primary storage means to form the particular
data.



14. Apparatus according to claim 13, wherein the request message from the client requests restoration of
particular data that is not the most recent version thereof backed up by the client, further comprising
combining data stored in the tape storage means with the respective delta and baseline data to form the
particular data.

15. A method of backing up to a data backup and restore apparatus attached to a network data stored in
one or more clients also attached to the network, the method comprising the data backup and restore
apparatus storing in primary data storage a most recent version of all data received from the clients and,
from time to time, in accordance with pre-determined criteria, storing in secondary data storage at least
some of the data stored in the primary data storage.


16. A data storage system comprising:

a network;

tape storage apparatus as claimed in any one of claims 1 to 14; and

at least one client connected to the apparatus, the (or the at least one) client comprising client
storage means and client processing means, the client processing means being programmed in accordance
with pre-determined criteria to determine when data stored in the client storage means should be backed
up to the tape backup apparatus.

17. A system according to claim 16, wherein the client processing means comprises means to form and
transmit a message to the tape storage apparatus, the message including a request to initiate a backup
operation.

18. A system according to claim 16 or claim 17, wherein the client processing means comprises means to
schedule a backup operation, means to select data in the client storage means to be backed-up, and means
to transmit data across the network for storage by the tape storage apparatus.
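
The client-side scheduling and selection of claims 16 to 18 might look like the following sketch. The abstract mentions time since the last backup and the amount of information generated since the last backup as scheduling criteria; the function names, the default thresholds, and the (name, modified time, size) record layout are assumptions for the example.

```python
def backup_due(now, last_backup, changed_bytes,
               max_age=24 * 3600, max_changed=50 * 2**20):
    """Decide whether a backup should be scheduled.

    Due when enough time has passed since the last backup or enough data has
    changed; both thresholds are hypothetical defaults.
    """
    return (now - last_backup) >= max_age or changed_bytes >= max_changed


def select_changed(files, last_backup):
    """Pick the files modified since the last backup.

    files: iterable of (name, modified_time, size) records.
    """
    return [name for name, mtime, _size in files if mtime > last_backup]
```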

19. A client configured for operation in a data storage system as claimed in any one of claims 16 to 18.

20. Data backup and restore apparatus, comprising:

interface means for connecting the apparatus to one or more clients;

a controller;

a primary storage means; and

a secondary storage means,

wherein the controller is programmed:

to process backup and restore messages received from the one or more clients respectively to
backup to the primary storage means data received from the clients and to restore to said clients data from
the primary storage means; and

to backup to the tape storage means, in accordance with pre-defined criteria, at least some of the
data stored in the primary storage means and to restore to the primary storage means, in accordance with a
respective restore message received from a client, at least some data stored in the tape storage means.